{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![alt](https://raw.githubusercontent.com/callysto/callysto-sample-notebooks/master/notebooks/images/Callysto_Notebook-Banner_Top_06.06.18.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## A Glimpse Into the Future: Interactive Textbooks\n",
    "\n",
    "Jupyter Notebooks are easy to maintain, keep current and evergreen. \n",
    "\n",
    "Multiple Jupyter Notebooks can be combined to create Interactive Textbooks.\n",
    "\n",
    "For example, _The Foundations of Data Science_ class at UC Berkeley's [Interactive Textbook](https://ds8.gitbooks.io/textbook/content/).\n",
    "https://ds8.gitbooks.io/textbook/content/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# NHL Data\n",
    "\n",
    "This notebook highlights how to use and work with open data using Jupyter notebooks in comparison to a more traditional approach of using standard, desktop tools to perform an open data assignment. \n",
    "\n",
    "The goal of the exercise is to use current National Hockey League (NHL) results to determine whether a team is on pace for making the playoffs. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Traditional approach\n",
    "\n",
    "### Tool 1\n",
    "Traditionally, students would have had to go to a particular website to access the data:\n",
    "    http://www.hockey-reference.com/teams/CGY/2018_games.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<img src=\"images/cgy_standings.png\" width=\"800px\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Tool 2\n",
    "From there, they would have to manually copy and paste the data into a tool such as Microsoft Excel. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<img src=\"images/cgy_excel.png\" width=\"100%\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<img src=\"images/cgy_excel_graph.png\" width = 80% />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Tool 3\n",
    "...that is then copied and pasted into Microsoft Word in order to write up a final report. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<img src =\"images/cgy_word.png\" width = 80% />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "In total, that means the students would need to use the following tools:\n",
    "- a web browser\n",
    "- Microsoft Excel or something like it\n",
    "- Microsoft Word or something similar\n",
    "\n",
    "The final product is usually a static snapshot in time. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Jupyter notebooks approach\n",
    "\n",
    "Using Jupyter notebooks, the entire analysis can be done in one tool, requiring only a web browser. The end product is an interactive notebook that combines active code along with the explanatory narrative for how the analysis was conducted which can be interpreted by anyone. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "import urllib.request\n",
    "import pandas as pd\n",
    "from bs4 import BeautifulSoup\n",
    "from argparse import ArgumentParser\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "# Query the hockey-reference website for data\n",
    "html1 = urllib.request.urlopen(\"https://www.hockey-reference.com/teams/CGY/2017_games.html\").read()\n",
    "html2 = urllib.request.urlopen(\"https://www.hockey-reference.com/teams/VAN/2017_games.html\").read()\n",
    "soup1 = BeautifulSoup(html1,\"html5lib\")\n",
    "soup2 = BeautifulSoup(html2,\"html5lib\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "table1 = soup1.find_all('table')[0]\n",
    "table_body1 = table1.find('tbody')\n",
    "rows1 = table_body1.find_all('tr')\n",
    "table2 = soup2.find_all('table')[0]\n",
    "table_body2 = table2.find('tbody')\n",
    "rows2 = table_body2.find_all('tr')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "column_headers = [ch.getText() for ch in table1.find_all('tr')[0].find_all('th')]\n",
    "#print(column_headers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "team1_data = [[td1.getText() for td1 in rows1[i].find_all(['th','td'])]\n",
    "            for i in range(len(rows1))]\n",
    "team2_data = [[td2.getText() for td2 in rows2[i].find_all(['th','td'])]\n",
    "            for i in range(len(rows2))]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "df1 = pd.DataFrame(team1_data, columns=column_headers)\n",
    "df2 = pd.DataFrame(team2_data, columns=column_headers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "df1 = df1.drop(df1.index[[20,41,62,83]])\n",
    "df2 = df2.drop(df2.index[[20,41,62,83]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# Extracted and cleaned data from the hockey-reference website\n",
    "print(df1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "cols = ['GP','W', 'OL']\n",
    "df1_clean = df1[cols].apply(pd.to_numeric, errors='coerce')\n",
    "df2_clean = df2[cols].apply(pd.to_numeric, errors='coerce')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "df1_clean['Playoff_Pace']=df1_clean['GP']*96/82\n",
    "df1_clean['CGY_Points']=df1_clean['W']*2 + df1_clean['OL']\n",
    "df2_clean['VAN_Points']=df2_clean['W']*2 + df2_clean['OL']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "df1_clean['VAN_Points']=df2_clean['VAN_Points']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# Data analysis of my two favorite hockey teams\n",
    "df_combined=df1_clean\n",
    "df_combined=df_combined.drop(['W','OL'],axis=1)\n",
    "print(df_combined)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# Calgary Flames Points Pace\n",
    "plt.plot( 'GP', 'Playoff_Pace', data=df_combined, marker='', color='black', linewidth=4);\n",
    "plt.plot( 'GP', 'CGY_Points', data=df_combined, marker='', color='red', linewidth=4, linestyle='dashed', label=\"CGY\");\n",
    "plt.xlabel('GP');\n",
    "plt.ylabel('Points');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# Calgary Flames and Vancouver Canucks Points Pace\n",
    "plt.plot( 'GP', 'Playoff_Pace', data=df_combined, marker='', color='black', linewidth=4);\n",
    "plt.plot( 'GP', 'CGY_Points', data=df_combined, marker='', color='red', linewidth=4, linestyle='dashed', label=\"CGY\");\n",
    "plt.plot( 'GP', 'VAN_Points', data=df_combined, marker='', color='blue', linewidth=4, linestyle='dashed', label=\"VAN\");\n",
    "plt.xlabel('GP');\n",
    "plt.ylabel('Points')\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### In conclusion, the Calgary Flames and Vancouver Canucks were on pace (at some point) to make the playoffs last year 😜 \n",
    "\n",
    "I can save this analysis as a snapshot in time and I can also re-run this analysis next year in the _same_ Jupyter notebook to see how the results have changed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](https://raw.githubusercontent.com/callysto/callysto-sample-notebooks/master/notebooks/images/Callysto_Notebook-Banners_Bottom_06.06.18.jpg)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
